Detection of OOV words using generalized word models and a semantic class language model

نویسنده

  • Thomas Schaaf
چکیده

This paper describes an approach to detect out-of-vocabulary words in spontaneous speech using a language model built on semantic categories and a new type of generalized word models consisting of a mixture of specific and general acoustic units. We demonstrate the construction of the generalized word models as replacements for surnames in a German spontaneous travel planning task GSST [1]. We show that the use of our generalized word models improves recognition accuracy in cases where out-of-vocabulary words appear and does not lead to a degradation of the overall recognition accuracy. In our experiments we measured recall and precision rates of OOV-detection which are close to their theoretic optimum. Furthermore, we compared the effect of using cross-word-triphones vs. using context-independent cross-word models. We show that when using generalized word models with cross-word-triphones, the expected number of consequential errors following an OOV word can be reduced significantly by 37%.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A hierarchical language model incorporating class-dependent word models for OOV words recognition

A new language model is proposed to cope with the demands for recognizing out-of-vocabulary (OOV) words not registered in the lexicon. This language model is a class N-gram incorporating a set of word models that reflect the statistical characteristics of the phonotactics, which depend on the lexical classes. Utilization of class-dependency enhances recognition accuracy and enables identificati...

متن کامل

Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting

Islamic Republic of Iran Broadcasting (IRIB) as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, the IRIBchr('39')s archive is one of the richest archives in Iran containing a huge amount of multimedia data. Monitoring this massive volume of data, and brows and retrieval of this archive is one of the key issues for this broadcasting...

متن کامل

Class-Based N-Gram Language Model for New Words Using Out-of-Vocabulary to In-Vocabulary Similarity

Out-of-vocabulary (OOV) words create serious problems for automatic speech recognition (ASR) systems. Not only are they missrecognized as in-vocabulary (IV) words with similar phonetics, but the error also causes further errors in nearby words. Language models (LMs) for most open vocabulary ASR systems treat OOV words as a single entity, ignoring the linguistic information. In this paper we pre...

متن کامل

Multi Class-based n-gram Language Model for New Words Using Web Data

Out-of-vocabulary (OOV) words cause a serious problem for automatic speech recognition (ASR) system. Not only it will be miss-recognized as an in-vocabulary word with similar phonetics, but the error will also affect nearby words to make errors. Language models (LMs) for most of open vocabulary ASR systems treat OOV words as one entity, ignoring the linguistic information. In this paper we pres...

متن کامل

Design and implementation of Persian spelling detection and correction system based on Semantic

Persian Language has a special feature (grapheme, homophone, and multi-shape clinging characters) in electronic devices. Furthermore, design and implementation of NLP tools for Persian are more challenging than other languages (e.g. English or German). Spelling tools are used widely for editing user texts like emails and text in editors.  Also developing Persian tools will provide Persian progr...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2001